We perform a search for the rare decay $B_s^0 \to \mu^+\mu^-$ using data collected by the D0 experiment at
the Fermilab Tevatron Collider. This result is based on the full D0 Run II dataset corresponding to
10.4 fb$^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV. We use a multivariate analysis to increase the sensitivity
of the search. In the absence of an observed number of events above the expected background, we
set an upper limit on the decay branching fraction of $\mathcal{B}(B_s^0 \to \mu^+\mu^-) < 15 \times 10^{-9}$ at the 95% C.L.

PACS numbers: 13.20.He,14.40.Nd 



I. INTRODUCTION 

The rare decay $B_s^0 \to \mu^+\mu^-$ is highly suppressed in the
standard model (SM) due to its flavor changing neutral
current (FCNC) nature. FCNC decays can only proceed
in the SM through higher-order diagrams as shown in
Fig. 1. This decay is further suppressed due to the required
helicities of the final state muons in the decay of
the spin zero $B_s^0$ meson. Recent improvements in the SM
prediction for the branching fraction $\mathcal{B}(B_s^0 \to \mu^+\mu^-)$
include the effect of the non-zero lifetime difference $\Delta\Gamma_s$
between the heavy and light mass eigenstates of the $B_s^0$
meson [1, 2], resulting in an expected branching fraction
of $(3.5 \pm 0.2) \times 10^{-9}$, which is about 10% larger than
previous calculations [3].

Several scenarios of physics beyond the standard model
(BSM) predict significant enhancements of this decay
channel [4-6], making the study of this process a promising
way to search for new physics. However, it is also
possible in some BSM scenarios for this decay to be suppressed
even further than the SM prediction [7].

Previous D0 experiment 95% C.L. limits on the
branching fraction for $B_s^0 \to \mu^+\mu^-$ include a limit of
$5 \times 10^{-7}$ from a cut-based analysis using 240 pb$^{-1}$ of
integrated luminosity [8]; a limit of $1.2 \times 10^{-7}$ from a
likelihood ratio method using an integrated luminosity
of 1.3 fb$^{-1}$ [9]; and a limit of $5.1 \times 10^{-8}$ using a Bayesian
neural network and an integrated luminosity of 6.1 fb$^{-1}$
[10]. The result presented here uses the full D0 dataset
corresponding to 10.4 fb$^{-1}$ of $p\bar{p}$ collisions and supersedes
our previous results.

FIG. 1: The (a) box diagram and (b) electroweak penguin
diagram are examples of the FCNC processes through which
the decay $B_s^0 \to \mu^+\mu^-$ can proceed.

Recently, the LHCb Collaboration has presented the
first evidence for this decay, at a branching fraction consistent
with the SM prediction [11]. Previous to this result,
the most stringent 95% C.L. limits on this decay
came from the LHCb [12], CMS [13], and ATLAS [14]
Collaborations, which quote limits of $\mathcal{B}(B_s^0 \to \mu^+\mu^-) <
4.5 \times 10^{-9}$, $7.7 \times 10^{-9}$, and $22 \times 10^{-9}$, respectively. The
CDF Collaboration sees an excess over background corresponding
to a branching fraction of $(18^{+11}_{-9}) \times 10^{-9}$ and
to a 95% C.L. upper limit of $40 \times 10^{-9}$ [15].



II. THE D0 DETECTOR

The D0 experiment collected data at the Fermilab
Tevatron $p\bar{p}$ Collider at $\sqrt{s} = 1.96$ TeV from 2001 through
the shutdown of the Tevatron in 2011, a period referred
to as Run II.

The D0 detector is described in detail elsewhere [16].
For the purposes of this analysis, the most important
parts of the detector are the central tracker and the muon
system. The inner region of the D0 central tracker consists
of a silicon microstrip tracker (SMT) that covers
pseudorapidities $|\eta| < 3$ [17]. In the spring of 2006, an
additional layer of silicon (Layer 0) was added close to the
beam pipe [18]. Since the detector configuration changed
significantly with this addition, the D0 dataset is divided
into two distinct periods (Run IIa and Run IIb), with the
analysis performed separately for each period. Moving
away from the interaction region, the next detector subsystem
encountered is the D0 central fiber tracker (CFT),
which consists of 16 concentric cylinders of scintillating
fibers, covering $|\eta| < 2.5$. Both the SMT and CFT are
located within a 2 T superconducting solenoidal magnet.
The D0 muon system is located outside of the finely segmented
liquid argon sampling calorimeter. The muon
system consists of three layers of tracking detectors and
trigger scintillators, one layer in front of 1.8 T toroidal
magnets and two additional layers after the toroids. The
muon system covers $|\eta| < 2$.

The data used in this analysis were collected with a 
suite of single muon and dimuon triggers. 

III. ANALYSIS OVERVIEW

This analysis was performed with the relevant dimuon
mass region blinded until all analysis procedures were
final. Our dimuon mass resolution is not sufficient to
separate $B_s^0 \to \mu^+\mu^-$ from $B^0 \to \mu^+\mu^-$, but in this
analysis we assume that there is no contribution from
$B^0 \to \mu^+\mu^-$, since this decay is expected to be suppressed
with respect to $B_s^0 \to \mu^+\mu^-$ by the ratio of the
CKM matrix elements $|V_{td}/V_{ts}|^2 \approx 0.04$ [19]. The most
stringent 95% C.L. limit on the decay $B^0 \to \mu^+\mu^-$, which
is from the LHCb experiment [11], is $\mathcal{B}(B^0 \to \mu^+\mu^-) <
9.4 \times 10^{-10}$.

$B_s^0 \to \mu^+\mu^-$ candidates are identified by selecting
two high-quality muons of opposite charge that form a
good three-dimensional vertex well-separated from the
primary $p\bar{p}$ interaction due to the relatively long lifetime
of the $B_s^0$ meson [19]. A crucial requirement for this analysis
is the suppression of the large dimuon background
arising from semileptonic $b$ and $c$ quark decays. Figure 2
shows a schematic diagram of the signal decay and
the two dominant background processes. Backgrounds in
the dimuon effective mass region below the $B_s^0$ mass are
dominated by sequential decays such as $b \to \mu^- \nu c$ with
$c \to \mu^+ \nu X$, as shown in Fig. 2(b). Backgrounds in the
dimuon mass region above the $B_s^0$ mass are dominated
by double semileptonic decays such as $b(c) \to \mu^- \nu X$ and
$\bar{b}(\bar{c}) \to \mu^+ \nu X$, as shown in Fig. 2(c). For both of these
backgrounds, the muons do not form a real vertex, but
the tracks can occasionally be close enough in space to
be reconstructed as a "fake" vertex.



Figure 2 illustrates the differences between signal and
background that we exploit as a general analysis strategy.
The dimuon system itself should form a good vertex
consistent with the decay of a single particle originating
from the $p\bar{p}$ interaction vertex. The $B_s^0$ candidate should
have a small impact parameter with respect to the primary
$p\bar{p}$ interaction vertex, while the individual muons
should in general have fairly large impact parameters.
In addition to quantities related to the dimuon system,
Fig. 2 illustrates that the environment surrounding the
$B_s^0$ candidate should be quite different for signal compared
to backgrounds. The dimuon system for the signal
should be fairly well isolated, while the fake dimuon
vertex in background events is likely to have additional
tracks and additional vertices nearby. No single variable
is able to provide definitive discrimination against these
backgrounds, so we use a multivariate technique as described
in Sec. VII to exploit these differences between
signal and background.



In addition to dimuon backgrounds from semileptonic
heavy quark decays, there are peaking backgrounds arising
from $B_s^0 \to hh$ or $B^0 \to hh$, where $hh$ can be $KK$, $K\pi$,
or $\pi\pi$. Of these, $B_s^0 \to KK$ is the dominant contribution.
The $K$ or $\pi$ mesons can be misidentified as a muon by
decay in flight, $K/\pi \to \mu\nu$, or by penetrating far enough
in the detector to create hits in the muon system. For
these decays to be misidentified as signal, both hadrons
must be misidentified as muons, but since the decay we
are looking for is rare, $B_s^0/B^0 \to hh$ decays constitute a
background of magnitude similar to that of the expected
signal.



The number of $B_s^0 \to \mu^+\mu^-$ decays expected in our
dataset is determined from analysis of the normalization
decay channel $B^\pm \to J/\psi K^\pm$, with $J/\psi \to \mu^+\mu^-$, as
described in detail in Sec. VI.





FIG. 2: (color online) Schematic diagrams showing (a) the signal decay, $B_s^0 \to \mu^+\mu^-$, and main backgrounds: (b) sequential
decay, $b \to c\mu^-$ followed by $c \to \mu^+$, and (c) double semileptonic decay, $b \to \mu^-$ and $\bar{b} \to \mu^+$.



IV. MONTE CARLO SIMULATION



Detailed Monte Carlo (MC) simulations for both the
$B_s^0 \to \mu^+\mu^-$ signal and the $B^\pm \to J/\psi K^\pm$ normalization
channels are obtained using the PYTHIA [20] event generator,
interfaced with the EVTGEN [21] decay package.
The MC includes primary production of $b\bar{b}$ quarks that
are approximately back-to-back in azimuthal angle, and
also includes gluon splitting $g \to b\bar{b}$ where the gluon may
have radiated from any quark in the event. The latter
leads to a relatively collimated $b\bar{b}$ system that produces
the dominant background when both $b$ and $\bar{b}$ quarks decay
semileptonically to muons.

The detector response is simulated using GEANT [22]
and overlaid with events from randomly collected $p\bar{p}$
bunch crossings to simulate multiple $p\bar{p}$ interactions. A
correction to the MC width of the dimuon mass distribution
is determined from $J/\psi \to \mu^+\mu^-$ decays in data,
and this correction is then scaled to the $B_s^0$ mass region.
The $B_s^0 \to \mu^+\mu^-$ mass distribution in the MC is
well described by a double Gaussian function with the
two means constrained to be equal, but with the widths
($\sigma_1$ and $\sigma_2$) and relative fractions determined by a fit to
the corrected mass distribution. The average width is
$\sigma_{av} = f\sigma_1 + (1 - f)\sigma_2 = 125$ MeV, where $f$ is the fraction
of the area associated with $\sigma_1$.

We measure the trigger efficiencies in the data using
events with no requirements other than a $p\bar{p}$ bunch crossing
(zero-bias events) or events requiring only an inelastic
$p\bar{p}$ interaction (minimum-bias events). The MC generation
does not include trigger efficiencies, but the MC
events are reweighted to reproduce the trigger efficiency
as a function of the muon transverse momentum ($p_T$). In
addition, the MC events are corrected to describe the $p_T$
distribution of $B$ mesons above the trigger threshold, as
determined from $B^\pm \to J/\psi K^\pm$ decays. Since the trigger
conditions changed throughout the course of Run II, the
$p_T$ corrections are determined separately for five different
data epochs, with each epoch typically separated by an
accelerator shut-down of a few months' duration. Figure 3
compares data and MC for several $p_T$ distributions
in the normalization channel, after these corrections. The
background components in the $B^\pm$ distributions are removed
by a side-band subtraction technique, that is, by
subtracting the corresponding distributions from events
above and below the $B^\pm$ mass region. As can be seen in
Fig. 3, the $p_T$ distributions in the MC simulation and
normalization channel data are generally in excellent agreement.
Figure 3 shows a single data epoch, but all data
epochs show similar agreement.

In addition to the signal MC, we also study the $B_s^0 \to KK$
background using a sample of MC events that contains
about six times the expected number of such events
in our data sample.



V. EVENT SELECTION



The $B_s^0$ candidate events selected for further study are
chosen as follows. We select two high-quality, oppositely-charged
muons based on information from both the central
tracker and the muon detectors. The primary vertex
(PV) of each $p\bar{p}$ interaction is defined using all available
well-reconstructed tracks and constrained by the mean
beam-spot position in the transverse plane. If a bunch
crossing has more than one $p\bar{p}$ interaction vertex, we ensure
that both muons are consistent with originating from
the same PV. Tracks reconstructed in the central tracker
are required to have at least two hits in both the SMT
and CFT detectors. These tracks are extrapolated to the
muon system, where they are required to match hits observed
in the muon detectors. Each muon is required to
have transverse momentum $p_T > 1.5$ GeV and to have
pseudorapidity $|\eta| < 2$. Both muons are required to have
hits in the muon detectors in front of the toroids, and
at least one of the muons must also have hits in at least
one of the muon layers beyond the toroids. To reduce
combinatorial backgrounds, the two muons must form a
three-dimensional vertex with $\chi^2/\mathrm{dof} < 14$. The dimuon
vertex is required to be well separated from the PV by
examining the transverse decay length. The transverse
decay length $L_T$ is defined as $L_T = \vec{l}_T \cdot \vec{p}_T / |\vec{p}_T|$, where
the vector $\vec{l}_T$ is from the PV to the dimuon vertex in
the transverse plane, and $\vec{p}_T$ is the transverse momentum
vector of the dimuon system. The quantity $\sigma_{L_T}$ is the
uncertainty on the transverse decay length determined from
track parameter uncertainties and the uncertainty in the
position of the PV. To reduce prompt backgrounds, the
transverse decay length significance of the dimuon vertex,
$L_T/\sigma_{L_T}$, must be greater than three. Events are
selected for further study if the dimuon mass $M_{\mu\mu}$ is between
4.0 GeV and 7.0 GeV. These criteria are intended
to be fairly loose to maintain high signal efficiency, with
further discrimination provided by the multivariate technique
discussed in Sec. VII.

FIG. 3: (color online) Comparison of $p_T$ distributions for data and MC simulation, for the normalization channel $B^\pm \to J/\psi K^\pm$,
in a single data epoch, (a) for the higher-$p_T$ (leading) muon, (b) lower-$p_T$ (trailing) muon, (c) $J/\psi$, (d) kaon, and (e) $B^\pm$ meson.
All distributions are normalized to unit area.
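
The transverse decay length requirement described in this section can be illustrated with a short sketch. This is not the D0 reconstruction code; the function names and the tuple layout for vertices and momenta are assumptions made for illustration only.

```python
import math


def transverse_decay_length(pv_xy, sv_xy, pt_xy):
    """Return L_T = l_T . p_T / |p_T| in the transverse plane.

    pv_xy: (x, y) of the primary vertex (hypothetical input layout)
    sv_xy: (x, y) of the dimuon vertex
    pt_xy: (px, py) of the dimuon system
    """
    lx = sv_xy[0] - pv_xy[0]
    ly = sv_xy[1] - pv_xy[1]
    pt_mag = math.hypot(pt_xy[0], pt_xy[1])
    return (lx * pt_xy[0] + ly * pt_xy[1]) / pt_mag


def passes_decay_length_cut(l_t, sigma_l_t, min_significance=3.0):
    """Apply the requirement L_T / sigma_{L_T} > 3 quoted in the text."""
    return l_t / sigma_l_t > min_significance
```

Because $L_T$ is a projection onto the dimuon momentum direction, it is negative when the vertex displacement points opposite to the momentum, so such candidates fail the cut automatically.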

The normalization channel decays $B^\pm \to J/\psi K^\pm$ with
$J/\psi \to \mu^+\mu^-$ are reconstructed in the data by first finding
the decay $J/\psi \to \mu^+\mu^-$ and then adding a third
track, assumed to be a charged kaon, to the dimuon vertex.
The selection criteria for the signal and normalization
channel are kept as similar as possible. In addition
to the above requirements on the muons, we require the
$K^\pm$ to have $p_T > 1$ GeV and $|\eta| < 2$, and we require the
three-track vertex to have $\chi^2/\mathrm{dof} < 6.7$. In the normalization
channel the dimuon mass is required to be in the
$J/\psi$ mass region, 2.7 GeV < $M_{\mu\mu}$ < 3.45 GeV.



VI. DETERMINATION OF THE SINGLE 
EVENT SENSITIVITY 

To determine the number of $B_s^0 \to \mu^+\mu^-$ decays
we expect in the data, we normalize to the number of
$B^\pm \to J/\psi K^\pm$ candidates observed in the data. The
number of $B^\pm \to J/\psi K^\pm$ decays is used to determine the
single event sensitivity (SES), defined as the branching
fraction for which one event is expected to be present in
the dataset. The SES is calculated from

$$\mathrm{SES} = \frac{1}{N(B^\pm)} \times \frac{\epsilon(B^\pm)}{\epsilon(B_s^0)} \times \frac{f(b \to B^\pm)}{f(b \to B_s^0)} \times \mathcal{B}(B^\pm \to J/\psi K^\pm) \times \mathcal{B}(J/\psi \to \mu^+\mu^-).$$

In this expression $N(B^\pm)$ is the number of $B^\pm \to J/\psi K^\pm$
decays observed in the data, as discussed below. The
efficiency for reconstructing the normalization channel
decay, $\epsilon(B^\pm)$, and the signal channel, $\epsilon(B_s^0)$, are determined
from MC simulations as discussed in more detail
below. The fragmentation ratio $f(b \to B_s^0)/f(b \to B^\pm)$
is the relative probability of a $b$ quark fragmenting to a
$B_s^0$ compared to a $B^\pm$. We use the "high energy" average
$f(b \to B_s^0)/f(b \to B^\pm) = 0.263 \pm 0.017$ provided by the
Heavy Flavor Averaging Group [23] for the 2012 Particle
Data Group compilation [19], which is consistent with
other recent measurements [24]. The product of the
branching fractions $\mathcal{B}(B^\pm \to J/\psi K^\pm) \times \mathcal{B}(J/\psi \to \mu^+\mu^-)$
is $(6.01 \pm 0.21) \times 10^{-5}$ [19].

Figure 4 shows the normalization channel mass distribution,
$M(\mu^+\mu^-K^\pm)$, for the entire Run II dataset. The
mass distribution is fitted to a double Gaussian function
to model the normalization channel decay and an exponential
function to model the dominant background.
A hyperbolic tangent threshold function is also included
in the fit to model partially reconstructed $B$ meson decays,
primarily $B^0 \to J/\psi K^{0*}$. A possible contribution
from $B^\pm \to J/\psi \pi^\pm$ is also included in the fit, although
this contribution is not statistically significant and is not
shown in Fig. 4. Systematic uncertainties on $N(B^\pm)$
are determined from variations in the mass range of the
fit, the histogram binning, and the background model.
An additional systematic uncertainty on $N(B^\pm)$ is due
to the candidate selection. If an event has more than
one $B^\pm \to J/\psi K^\pm$ candidate, we retain only the candidate
with the best vertex $\chi^2$. This choice results in fewer
overall reconstructed $B^\pm \to J/\psi K^\pm$ decays but also less
background. To determine the systematic effect due to
this choice, we have reconstructed $B^\pm \to J/\psi K^\pm$ decays
in two of the five data epochs retaining all candidates.
The SES depends on the ratio $N(B^\pm)/\epsilon(B^\pm)$, and we
find that this ratio varies by at most 2.2%, which we take
as an additional systematic uncertainty on $N(B^\pm)$. We
observe a total of $(87.4 \pm 3.0) \times 10^3$ $B^\pm \to J/\psi K^\pm$ decays
in the full dataset, where the uncertainty includes both
statistical and systematic effects.

FIG. 4: (color online) Invariant mass distribution for the
normalization channel $B^\pm \to J/\psi K^\pm$ for the entire Run
II dataset. The full fit is shown as the solid line, the
$B^\pm \to J/\psi K^\pm$ contribution is shown as the dashed line, the
exponential background is shown as the dotted line, and the
contribution from partially reconstructed $B$ meson decays is
shown as the dot-dash line.

The ratio of reconstruction efficiencies that enters into
the SES is determined from MC simulation. One source
of systematic uncertainty in the efficiency ratio arises
from the trigger efficiency corrections applied to the MC,
as described in Sec. IV. The variation in these corrections
over data epochs with similar trigger conditions allows
us to set a 1.5% systematic uncertainty on the efficiency
ratio due to this source. An additional systematic uncertainty
arises from the efficiency for finding a third
track. There could be a data/MC discrepancy in this
efficiency which will not cancel in the ratio. We evaluate
this systematic uncertainty by comparing the efficiency
for finding an extra track in data and MC in the four-track
decay $B^0 \to J/\psi K^{0*}$ with $K^{0*} \to K\pi$ and in the
three-track normalization channel decay $B^\pm \to J/\psi K^\pm$.
From this study, we determine that the data/MC efficiency
ratio for identifying the third track varies with
data epoch but is on average $0.88 \pm 0.06$, where the uncertainty
includes statistical uncertainties from the fits
used to extract the number of signal events, and systematic
uncertainties estimated from fit variations. The
efficiency for $B^\pm$ reconstruction is adjusted in each data
epoch for this track-finding efficiency correction. The reconstruction
efficiency ratio $\epsilon(B^\pm)/\epsilon(B_s^0)$ is determined
to be $(13.0 \pm 0.5)$% on average, but varies over the different
data epochs by about 1.0%. The efficiency for the
$B^\pm \to J/\psi K^\pm$ decay is impacted by the softer $p_T$ distribution
of the muons in the three-body decay as well
as the fairly hard ($p_T > 1$ GeV) cut on the $p_T$ of the
kaon, and the candidate selection which retains only the
three-track candidate with the best vertex $\chi^2$.

When all statistical and systematic uncertainties are
taken into account, the SES is found to be $(0.336 \pm
0.029) \times 10^{-9}$ before the multivariate selection, yielding
a SM expected number of $B_s^0 \to \mu^+\mu^-$ events of $10.4 \pm
1.1$ events in our data sample.
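
As a rough cross-check of the numbers quoted in this section, the SES can be evaluated from the central values alone. The sketch below assumes the SES is the product of the branching fractions divided by $N(B^\pm)$, multiplied by the efficiency ratio $\epsilon(B^\pm)/\epsilon(B_s^0)$ and divided by the fragmentation ratio $f(b \to B_s^0)/f(b \to B^\pm)$; the function name is hypothetical, and uncertainties and the per-epoch treatment are ignored.

```python
def single_event_sensitivity(n_norm, bf_product, eff_ratio, frag_ratio):
    """Central-value SES.

    n_norm:     number of B+- -> J/psi K+- decays, N(B+-)
    bf_product: B(B+- -> J/psi K+-) x B(J/psi -> mu mu)
    eff_ratio:  eps(B+-) / eps(Bs0)
    frag_ratio: f(b -> Bs0) / f(b -> B+-)
    """
    return bf_product / n_norm * eff_ratio / frag_ratio


# Central values quoted in the text.
ses = single_event_sensitivity(87.4e3, 6.01e-5, 0.130, 0.263)

# Expected SM yield for a branching fraction of 3.5e-9.
n_sm = 3.5e-9 / ses
```

With these inputs the SES comes out near $0.34 \times 10^{-9}$ and the expected yield near 10.3 events, close to the quoted $(0.336 \pm 0.029) \times 10^{-9}$ and $10.4 \pm 1.1$; the small difference is within the rounding of the quoted inputs.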



VII. MULTIVARIATE DISCRIMINANT 

A boosted decision tree (BDT) algorithm, as implemented
in the TMVA package of ROOT [25], is used
to differentiate between signal and the dominant backgrounds.
The BDT is trained using MC simulation
for the signal and data sidebands for the background.
The data sidebands include events in the dimuon mass
range 4.0-4.9 GeV (low-mass sidebands) and 5.8-7.0 GeV
(high-mass sidebands), with all selection cuts applied.
The low-mass sidebands are dominated by sequential decays,
illustrated in Fig. 2(b), while the high-mass
sidebands are dominated by double $B$ hadron decays, as
illustrated in Fig. 2(c). We therefore train two BDTs to
separately discriminate against these two backgrounds.
Each BDT discriminant uses 30 variables that fall into
two general classes.

One class of variables includes kinematic and topological
quantities related to the dimuon system. These variables
include the pointing angle, defined as the angle between
the dimuon momentum vector $\vec{p}(\mu^+\mu^-)$ and the
vector from the PV to the dimuon vertex. The dimuon
$p_T$ and impact parameter, as well as the $p_T$ values of the
individual muons and their impact parameters, are also
used as discriminating variables. As examples of dimuon
system variables that discriminate between signal and
background, Fig. 5(a) shows the impact parameter significance
(impact parameter divided by its uncertainty)
of the $B_s^0$ candidate for signal MC and background, and
Fig. 5(b) shows the minimum impact parameter significance
for the individual muons, that is, the smaller of
the two values.

A second general class of variables used in the BDT discriminants
includes various isolation-related quantities.
Isolation is defined with respect to a momentum vector $\vec{p}$
by constructing a cone in azimuthal angle $\phi$ and pseudorapidity
$\eta$ around the momentum vector, with the cone
radius defined by $\mathcal{R} = \sqrt{\Delta\eta^2 + \Delta\phi^2}$. The isolation $\mathcal{I}$ is
then defined as $\mathcal{I} = p_T/[p_T + p_T(\mathrm{cone})]$, where $p_T(\mathrm{cone})$
is the scalar sum of the $p_T$ of all tracks (excluding the
track of interest) with $\mathcal{R}$ less than some cut-off value,
chosen to be $\mathcal{R} = 1$ in this analysis. For a perfectly isolated
track (that is, no other tracks in the cone), $\mathcal{I} = 1$.
Figure 2 shows that background events are expected to
be less isolated than signal events. For maximum signal/background
discrimination, we define isolation cones
around the dimuon direction and around each muon individually.
From simulation studies, we find that for background
events, the two muons are often fairly well separated
in space, so using individual isolation cones around
each muon adds discriminating power. Figure 6 compares
signal MC and data sidebands for two examples of
isolation variables.
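
The isolation definition above can be written out as a minimal sketch. The dictionary layout for tracks and the function names are assumptions for illustration; the caller is assumed to have already removed the track (or muon pair) of interest from the list of surrounding tracks.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def isolation(candidate, other_tracks, cone_radius=1.0):
    """I = pT / (pT + pT(cone)) for a cone of radius R around the candidate.

    candidate and each entry of other_tracks are dicts with keys
    "pt", "eta", "phi" (a hypothetical layout, not a D0 data format).
    """
    cone_pt = 0.0
    for trk in other_tracks:
        d_eta = candidate["eta"] - trk["eta"]
        d_phi = delta_phi(candidate["phi"], trk["phi"])
        if math.hypot(d_eta, d_phi) < cone_radius:
            cone_pt += trk["pt"]  # scalar sum of track pT inside the cone
    return candidate["pt"] / (candidate["pt"] + cone_pt)
```

The wrap-around in `delta_phi` matters for candidates near $\phi = \pm\pi$, where a naive subtraction would place a nearby track almost $2\pi$ away and wrongly exclude it from the cone.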

We also search for additional vertices near the dimuon
vertex using two different techniques. As illustrated by
Fig. 2, in background events the muons often form a good
vertex with another charged track. We try to reconstruct
such vertices using tracks that are associated with the
same PV as the dimuon pair, which have an impact parameter
with respect to the PV of at least 30 microns,
and which have an impact parameter significance of at
least 3.0. If a track satisfying these requirements forms a
vertex with one of the muons with a vertex $\chi^2/\mathrm{dof} < 5.0$,
we consider this an additional vertex. Additional tracks,
satisfying the same requirements as above, can be included
in this vertex if they do not increase the vertex
$\chi^2$ by more than 5.0. This procedure is carried out with
both muons, allowing for the possibility of finding an additional
vertex with either or both of the muons. We also
attempt to reconstruct additional vertices using tracks
that have an impact parameter significance with respect
to the dimuon vertex of less than 4.0. We allow these vertices
to include or not include one of the muons. When an
additional vertex is successfully reconstructed, the vertex
$\chi^2$, the invariant mass of the particles included in the vertex,
and the vertex pointing angle are used as discriminating
variables in the BDTs. In the case where no such
vertices are found, these variables are set to nonphysical
values. We find that, for the background sidebands, at
least one additional vertex is reconstructed 80% of the
time, while for the signal MC, one or more additional
vertices are found 40% of the time.

To verify that the MC simulation is a good representation
of the data, we compare the sideband-subtracted
normalization channel data with the normalization channel
MC. Figure 7 compares the normalization channel
data and the MC simulation for the $B^\pm$ meson impact
parameter significance and the minimum muon impact
parameter significance. Figure 8 shows the same comparison
for the dimuon and individual muon isolation
variables. We check all 30 variables used in the multivariate
discriminant to confirm good agreement between
data and MC for the normalization channel.

We make additional requirements on both the data
sidebands and the signal MC before events are used in
the BDT training. These requirements include dimuon
$p_T > 5$ GeV and the cosine of the dimuon pointing angle
> 0.95. These requirements are 78% efficient on average
in retaining signal events but exclude about 96% of the
background. We find a significant enhancement in background
rejection from the BDT discriminants using these
additional requirements before BDT training. These requirements
are $(93 \pm 1)$% efficient for the normalization
mode MC, and $(91 \pm 3)$% efficient for the normalization
mode data.

FIG. 5: (color online) Comparison of signal MC and background sideband data for (a) the $B_s^0$ candidate impact parameter
significance and (b) the minimum muon impact parameter significance. All distributions are normalized to unit area.

FIG. 6: (color online) Comparison of signal MC and background sideband data for (a) isolation defined with respect to the
dimuon system and (b) for the average of the two isolations defined with respect to the individual muons. All distributions are
normalized to unit area.

To improve the statistics available for training, the
data epochs are combined and used together to train the
BDT. The signal MC samples for each data epoch are
combined according to the integrated luminosity for each
epoch into a common sample. The data sidebands and
signal MC are then randomly split into three samples.
Sample A, with 25% of the events, is used to train the
BDTs. Sample B, with 25% of the events, is used to optimize
the selections on the BDT response. Sample C, with
50% of the events, is used to determine the expected signal
(from the MC sample) and background (from the data
sideband sample) yields. The results of the TMVA BDT
training for both BDT1, trained to remove sequential decay
backgrounds, and BDT2, trained to remove double
semileptonic $B$ meson decays, can be seen in Fig. 9. We
check that the response of both BDT discriminants is independent
of dimuon mass over the relevant mass range.
The optimal BDT selections are determined by optimizing
the expected limit on $\mathcal{B}(B_s^0 \to \mu^+\mu^-)$ and are found
to be BDT1 > 0.19 and BDT2 > 0.26.
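
The three-way split and the final two-discriminant selection can be sketched as follows. This is an illustration only: the function names, the fixed random seed, and the use of Python's `random` module are assumptions, and the actual analysis uses TMVA within ROOT.

```python
import random


def split_samples(events, fractions=(0.25, 0.25, 0.50), seed=12345):
    """Randomly split events into samples A (training), B (cut
    optimization), and C (yield estimation)."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary choice
    n_a = int(fractions[0] * len(shuffled))
    n_b = int(fractions[1] * len(shuffled))
    sample_a = shuffled[:n_a]
    sample_b = shuffled[n_a:n_a + n_b]
    sample_c = shuffled[n_a + n_b:]  # remainder, about 50%
    return sample_a, sample_b, sample_c


def passes_bdt_selection(bdt1, bdt2, cut1=0.19, cut2=0.26):
    """Require both discriminants to exceed the cut values quoted in the text."""
    return bdt1 > cut1 and bdt2 > cut2
```

Keeping the three samples disjoint means the events used to estimate the expected yields were never seen during training or cut optimization, which avoids biasing those estimates.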

FIG. 7: (color online) Comparison of normalization channel MC and sideband-subtracted data for (a) $B^\pm$ impact parameter
significance and (b) the minimum muon impact parameter significance. All distributions are normalized to unit area.

FIG. 8: (color online) Comparison of normalization channel MC and sideband-subtracted data for (a) dimuon isolation and
(b) the average of the two individual muon isolations. All distributions are normalized to unit area.



VIII. BACKGROUND ESTIMATES AND 
EXPECTED LIMIT 



Figure 10 shows the blinded dimuon mass distributions
before (Fig. 10(a)) and after (Fig. 10(b)) the BDT
selection cuts for the half of the data (sample C) used
to estimate the number of background events. The signal
window within the blinded region is chosen to maximize
the signal significance $S/\sqrt{S + B}$, where $S$ is the
expected number of signal events as determined from
the SM branching fraction, and $B$ is the expected background.
The number of expected background events is
determined by a likelihood fit to the data in the sideband
regions, which is then interpolated into the blinded
region. The optimum signal region is determined to be
$\pm 1.6\sigma$ centered on the $B_s^0$ mass, where $\sigma = 125$ MeV
is the average width of the double Gaussian used to fit
the dimuon mass distribution in the $B_s^0 \to \mu^+\mu^-$ MC
sample. The blinded region includes a control region of
width $2\sigma$ on each side of the signal window. While only
half of the dataset is shown, the numbers of expected
background events quoted in Fig. 10 are scaled to the
full dataset. The numbers given are for the estimated
dimuon background events in the signal region.
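
The window optimization can be illustrated with a simplified scan of $S/\sqrt{S+B}$ versus the half-width of the window. The sketch below makes several assumptions that are not in the analysis: the signal is modeled as a single Gaussian rather than the double Gaussian, the background is taken to be flat in mass, and the input yields are illustrative values chosen to be of the same order as those quoted in this section.

```python
import math


def significance(n_sigma, s_total, bkg_per_gev, sigma_gev):
    """S / sqrt(S + B) for a window of +- n_sigma * sigma around the peak.

    Assumes a single-Gaussian signal and a flat background density
    (both simplifications made for this illustration).
    """
    s = s_total * math.erf(n_sigma / math.sqrt(2.0))
    b = bkg_per_gev * 2.0 * n_sigma * sigma_gev
    return s / math.sqrt(s + b)


def best_window(s_total, bkg_per_gev, sigma_gev):
    """Scan half-widths from 0.5 to 3.0 sigma in steps of 0.1."""
    scan = [0.5 + 0.1 * i for i in range(26)]
    return max(scan, key=lambda n: significance(n, s_total, bkg_per_gev, sigma_gev))


# Illustrative inputs: about 1.2 signal events, about 10 background
# events per GeV, and a 125 MeV mass resolution.
best = best_window(1.23, 10.0, 0.125)
```

With these simplified inputs the optimum falls near 1.5 standard deviations, in the same neighborhood as the $\pm 1.6\sigma$ window chosen in the analysis; the curve is fairly flat around its maximum, so the precise optimum is sensitive to the assumed signal and background shapes.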

The efficiency for retaining signal events when all BDT
selections are applied, including the pre-training cuts (see
Sec. VII) and the final BDT cuts, is determined to be
$0.12 \pm 0.01$, where the error is due to variation over
the different data epochs. We obtain a final SES of
$(2.8 \pm 0.24) \times 10^{-9}$, corresponding to an expected number
of signal events at the SM branching fraction of $1.23 \pm
0.13$. For the dimuon background the expected number
of events in the signal and control regions is determined
by applying a log likelihood fit to the dimuon mass distribution
using an exponential plus constant functional
form. The fit is performed excluding the blinded region,
and the resulting fit is interpolated into the signal and
control regions. This procedure yields an expected number
of dimuon background events in the signal region of
$4.0 \pm 1.5$ events, where the uncertainty is only statistical.
The corresponding estimate for the expected number
of events in the control region is $6.7 \pm 2.6$ events,
with $5.3 \pm 1.9$ events expected in the lower control region
(dimuon masses from 4.9 to 5.15 GeV), and $1.4 \pm 1.4$
events in the upper control region (dimuon masses from
5.55 to 5.8 GeV). To determine the systematic uncertainty
on the background estimate, we use other functional
forms for the background fit, resulting in a systematic
uncertainty of 0.6 events. Adding the statistical
and systematic errors in quadrature yields a final dimuon
background estimate in the signal region of $4.0 \pm 1.6$
events and $6.7 \pm 2.7$ events in the control region.

FIG. 9: (color online) Distributions of the BDT response for (a) BDT1, trained against sequential decay backgrounds, and (b)
BDT2, trained against double $B$ decay backgrounds. MC simulation is used for the signal, while the data sidebands are used
for the backgrounds. The vertical lines denote the BDT selection cuts in the analysis. All distributions are normalized to unit
area.

FIG. 10: (color online) Dimuon mass distribution for sample C (a) before and (b) after BDT selection cuts. The edges of the
blinded region are denoted in (b) by the vertical lines at 4.9 and 5.8 GeV, and the shaded area denotes the signal window. The
curves are fits to an exponential plus constant function. The numbers of expected background events are determined from an
interpolation of the fit into the signal window and scaled to the full dataset.

In addition to the dimuon background, there is background
from the decay mode $B_s^0 \to K^+K^-$, which has kinematics
very similar to the signal. We estimate this background by
scaling the expected number of signal events by the
appropriate branching fractions [19] and by the ratio of the
probability for both K mesons to be misidentified as muons,
$\epsilon(KK \to \mu\mu)$, to the probability that two muons
are correctly identified as muons, $\epsilon(\mu\mu \to \mu\mu)$.
The probability that a K meson is misidentified as a muon is
measured in the data using $D^0 \to K\pi$ decays. We assume
that the probability of two K mesons being misidentified as
muons is the product of the probabilities for each individual
K meson. The muon identification efficiency is measured in
the data from $J/\psi \to \mu\mu$ decays. The efficiency ratio
$\epsilon(KK \to \mu\mu)/\epsilon(\mu\mu \to \mu\mu)$ is
determined to be $(3.0 \pm 1.1) \times 10^{-5}$. We estimate
the background from $B_s^0 \to KK$ decays to be 0.28 ± 0.11
events. We also find a consistent estimate of this background
using a $B_s^0 \to KK$ MC sample. Other possible peaking
backgrounds such as $B_d^0 \to K\pi$ and $B_s^0 \to K\pi$ are
negligible due to the combination of smaller branching
fractions and a $\pi \to \mu$ misidentification probability
that is more than a factor of 10 smaller than the $K \to \mu$
misidentification probability in the D0 detector. 
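
The scaling described above can be written as a one-line formula. The sketch below is illustrative only: the function name `peaking_background` is hypothetical, the signal yield and SM branching fraction are the values quoted elsewhere in this paper, and the $B_s^0 \to K^+K^-$ branching fraction is an assumed placeholder, since the value taken from Ref. [19] is not quoted in the text. It also assumes that the only efficiency difference between the two modes is the particle identification ratio.

```python
def peaking_background(n_signal, bf_peaking, bf_signal, eff_ratio):
    """Scale a signal yield to a peaking-background yield.

    n_signal   : expected signal events
    bf_peaking : branching fraction of the peaking mode
    bf_signal  : branching fraction of the signal mode
    eff_ratio  : misidentification-to-identification efficiency ratio
    """
    return n_signal * (bf_peaking / bf_signal) * eff_ratio

N_SIGNAL_SM = 1.23      # expected SM signal events (quoted in the text)
BF_SIGNAL_SM = 3.5e-9   # SM branching fraction (quoted in the introduction)
EFF_RATIO = 3.0e-5      # eps(KK -> mumu) / eps(mumu -> mumu) from the text
BF_KK_ASSUMED = 2.6e-5  # ASSUMPTION: placeholder value, not from the text

n_kk = peaking_background(N_SIGNAL_SM, BF_KK_ASSUMED, BF_SIGNAL_SM, EFF_RATIO)
print(f"estimated KK peaking background: {n_kk:.2f} events")
```

With a branching fraction of this size, the estimate is a few tenths of an event, the same scale as the 0.28 ± 0.11 events quoted above.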

We set an upper limit on the $B_s^0 \to \mu^+\mu^-$ branching
fraction using the $\mathrm{CL}_s$, or modified frequentist, method
[2^. A Poisson likelihood function is used to calculate 
the number of signal events which would occur with a 
probability of 0.05 (for a 95% CL upper confidence limit) 
when $N_{\rm obs}$ data events are observed in the signal region 
with a known expected number of background events. 
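
For a single-bin counting experiment, the quantity described above is commonly written as follows. This is the standard textbook form, given as a sketch: the text does not spell out the exact expression used in the analysis, and the symbols $s$ (signal yield), $b$ (expected background), and $s_{\rm up}$ (limit on the signal yield) are introduced here for illustration.

```latex
\mathrm{CL}_s(s) =
  \frac{P(N \le N_{\rm obs} \mid s + b)}{P(N \le N_{\rm obs} \mid b)},
\qquad
P(N \le N_{\rm obs} \mid \mu) = \sum_{n=0}^{N_{\rm obs}} \frac{\mu^{n} e^{-\mu}}{n!},
\qquad
\mathrm{CL}_s(s_{\rm up}) = 0.05 .
```

The 95% C.L. upper limit on the signal yield is the value $s_{\rm up}$ at which $\mathrm{CL}_s$ falls to 0.05.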

The limit calculation includes a convolution over prob- 
ability distributions representing the uncertainties in the 
background and the signal. The uncertainty in the
$B_s^0 \to KK$ peaking background is assumed to be Gaussian.
The dimuon background in the signal region is estimated
by the fit shown in Fig. 10(b). The normalized 
likelihood function from this fit is used as the probability 
distribution function for the dimuon background in the 
convolution. The expected number of signal events, as- 
suming the SM branching fraction, is 1.23 ± 0.13 events, 
with the uncertainty assumed to be Gaussian. The total 
expected background is 4.3 ± 1.6 events. Weighting each 
possible outcome by its Poisson probability yields an ex- 
pected 95% C.L. upper limit on the branching fraction 
$\mathcal{B}(B_s^0 \to \mu^+\mu^-)$ of $23 \times 10^{-9}$. 

Upon unblinding, a total of nine events is found in 
the control region above and below the signal region, as 
shown in Fig. 11. Six events are found in the control region
below the signal window, and three events are found 
in the control region above the signal window. This num- 
ber of events and their distribution within the control re- 
gions is in agreement with the expected number of back- 
ground events interpolated from the data sidebands. As 
seen in Fig. 11, three events are found in the dimuon mass
signal window, in agreement with the expected background
and also with the expected signal + background. 
We check that the properties of all events found in the 
blinded region, such as the $p_T$ of the dimuon system, the
$p_T$ of the individual muons, the dimuon pointing angle, 
and the various isolation quantities, are consistent with 
expectations. We also check that, as the BDT cuts are re- 
laxed, the number of events observed in the signal region 
remains in good agreement with expectations, as shown 
in Fig. 12. 

The observed number of events and the SES allow us to
set a 95% C.L. upper limit of $\mathcal{B}(B_s^0 \to \mu^+\mu^-) < 15 \times 10^{-9}$. 
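
The counting-experiment logic behind the expected and observed limits can be sketched in Python. This is a simplified illustration, not the analysis code: it treats the background (4.3 events) and the SM signal yield (1.23 events) as exactly known, so it omits the convolution over uncertainties described above and is not expected to reproduce the quoted limits exactly. All function names are hypothetical.

```python
import math

def poisson_pmf(n, mean):
    """P(N = n) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)

def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    return sum(poisson_pmf(k, mean) for k in range(n_obs + 1))

def cls_value(n_obs, signal, background):
    """CLs = P(N <= n_obs | s + b) / P(N <= n_obs | b)."""
    return poisson_cdf(n_obs, signal + background) / poisson_cdf(n_obs, background)

def cls_upper_limit(n_obs, background, cl=0.95, s_max=100.0, tol=1e-6):
    """Signal yield at which CLs drops to 1 - cl, found by bisection."""
    alpha = 1.0 - cl
    lo, hi = 0.0, s_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls_value(n_obs, mid, background) > alpha:
            lo = mid  # CLs still too large: the limit lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_limit(background, cl=0.95, n_max=50):
    """Average the limit over background-only outcomes, weighted by Poisson probability."""
    return sum(poisson_pmf(n, background) * cls_upper_limit(n, background, cl)
               for n in range(n_max + 1))

BACKGROUND = 4.3   # total expected background (events), from the text
N_SIGNAL_SM = 1.23 # expected SM signal (events), from the text
BF_SM = 3.5e-9     # SM branching fraction, from the introduction
N_OBS = 3          # events observed in the signal window

s_obs = cls_upper_limit(N_OBS, BACKGROUND)
s_exp = expected_limit(BACKGROUND)
print(f"observed limit on B: {s_obs / N_SIGNAL_SM * BF_SM:.2e}")
print(f"expected limit on B: {s_exp / N_SIGNAL_SM * BF_SM:.2e}")
```

The conversion from a limit on the signal yield to a limit on the branching fraction assumes the yield scales linearly with the branching fraction, which is equivalent to multiplying by the single event sensitivity.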

FIG. 11: (color online) Dimuon mass distribution in the
blinded region for the full dataset after BDT selections are
applied. The curve shows the fit from Fig. 10(b) used to
determine the expected number of background events. The SM
expectation for signal events multiplied by five is also
indicated. The vertical lines mark the edges of the signal window. 

FIG. 12: (color online) Expected number of events and
observed number of events in the signal region as the two BDT
cuts are relaxed in parallel. The expected number of events
includes the dimuon background, the $B_s^0 \to KK$ background,
and the expected number of signal events. The upper
horizontal axis shows the cut applied to BDT1, while the lower
horizontal axis shows the cut applied to BDT2. 



IX. SUMMARY 



In summary, we have searched for the rare decay
$B_s^0 \to \mu^+\mu^-$ in the full D0 dataset. We employ two
Boosted Decision Tree multivariate discriminators, one
trained to discriminate against sequential decays
$b(\bar{b}) \to c\mu^-(\bar{c}\mu^+)X$ followed by
$c(\bar{c}) \to \mu^+(\mu^-)X$ and the other to discriminate
against double semileptonic decays $b \to \mu^- X$ and
$\bar{b} \to \mu^+ X$. The sidebands around the signal region
in the dimuon invariant mass distribution are used 
to estimate the dominant backgrounds. The expected 
limit is $23 \times 10^{-9}$, and the expected background (signal) 
in the signal region is 4.3 ± 1.6 (1.23 ± 0.13) events. We 
observe three events in the signal region, consistent with
the expected background. The probability that the background
alone (signal + background) could produce the 
observed number of events or a larger number of events 
in the signal region is 0.77 (0.88). We set an observed 
95% C.L. upper limit $\mathcal{B}(B_s^0 \to \mu^+\mu^-) < 15 \times 10^{-9}$. This
upper limit supersedes the previous D0 95% C.L. limit
of $51 \times 10^{-9}$ [10], and improves upon that limit by a
factor of 3.4. The improvement in the expected limit is a 
factor of 1.7 greater than the improvement that would 
be expected due to increased luminosity alone. The
additional improvement arises from the inclusion of several
isolation-type variables in the multivariate discriminants
and from the use of two separate discriminants to
distinguish backgrounds from sequential b quark decays and
double b quark decays. This result is the most stringent
Tevatron limit and is compatible with the recent evidence
of this decay produced by the LHCb experiment [11]. 
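
The probabilities quoted in this summary for observing three or more events can be approximated with a plain Poisson tail sum. The sketch below is a simplified cross-check under stated assumptions: it treats the expected yields (4.3 background and 1.23 signal events) as exactly known, whereas the quoted values of 0.77 (0.88) include the uncertainties on those yields, so small differences are expected. The function name `poisson_tail` is hypothetical.

```python
import math

def poisson_tail(n_min, mean):
    """P(N >= n_min) for a Poisson distribution with the given mean."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k)
                for k in range(n_min))
    return 1.0 - below

N_OBS = 3          # events observed in the signal region
BACKGROUND = 4.3   # expected background (events)
SIGNAL_SM = 1.23   # expected SM signal (events)

p_b = poisson_tail(N_OBS, BACKGROUND)
p_sb = poisson_tail(N_OBS, BACKGROUND + SIGNAL_SM)
print(f"P(N >= {N_OBS} | background only)     = {p_b:.2f}")
print(f"P(N >= {N_OBS} | signal + background) = {p_sb:.2f}")
```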